    Stereoscopic video description for key-frame extraction in movie summarization

    Quantifying the knowledge in Deep Neural Networks: an overview

    Deep Neural Networks (DNNs) have proven to be extremely effective at learning a wide range of tasks. Due to their complexity and frequently inexplicable internal state, DNNs are difficult to analyze: their black-box nature makes it challenging for humans to comprehend their internal behavior. Several attempts to interpret their operation have been made during the last decade, but analyzing deep neural models from the perspective of the knowledge encoded in their layers is a very promising research direction that has barely been explored. Such an approach could provide more accurate insight into a DNN model, its internal state, its learning progress, and its knowledge storage capabilities. The purpose of this survey is two-fold: a) to review the concept of DNN knowledge quantification and highlight it as an important near-future challenge, and b) to provide a brief account of the scant existing methods that attempt to actually quantify DNN knowledge. Although a few such algorithms have been proposed, this is an emerging topic still under investigation.

    Neural Natural Language Processing for Long Texts: A Survey of the State-of-the-Art

    The adoption of Deep Neural Networks (DNNs) has greatly benefited Natural Language Processing (NLP) during the past decade. However, the demands of long document analysis are quite different from those of shorter texts, and the ever-increasing size of documents uploaded on-line makes automated understanding of long texts a critical area of research. This article has two goals: a) it overviews the relevant neural building blocks, thus serving as a short tutorial, and b) it surveys the state-of-the-art in long document NLP, mainly focusing on two central tasks: document classification and document summarization. Sentiment analysis for long texts is also covered, since it is typically treated as a particular case of document classification. Additionally, this article discusses the main challenges, issues and current solutions related to long document NLP. Finally, the relevant, publicly available, annotated datasets are presented, in order to facilitate further research.
    Comment: 53 pages, 2 figures, 171 citations
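
One widely used strategy for classifying documents longer than an encoder's input limit (for example, 512 tokens for BERT-style models) is to split the text into overlapping windows, score each window, and pool the scores. The sketch below illustrates that idea only; it is not taken from the survey, and `chunk_classifier`, the 512/256 window and stride defaults, and mean pooling are all illustrative assumptions.

```python
from typing import Callable, List, Sequence

import numpy as np


def sliding_windows(tokens: Sequence, window: int, stride: int) -> List[Sequence]:
    """Split a token sequence into overlapping windows of at most `window` tokens."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(tokens) <= window:
        return [tokens]
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):  # last window reaches the end of the document
            break
        start += stride
    return chunks


def classify_long_document(
    tokens: Sequence,
    chunk_classifier: Callable[[Sequence], np.ndarray],  # hypothetical per-chunk scorer
    window: int = 512,
    stride: int = 256,
) -> np.ndarray:
    """Mean-pool per-chunk class scores into a single document-level score vector."""
    scores = [chunk_classifier(c) for c in sliding_windows(tokens, window, stride)]
    return np.mean(np.stack(scores), axis=0)
```

Mean pooling is the simplest aggregation; max pooling or a small learned aggregator over chunk embeddings are common alternatives when a single passage decides the label.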

    Multimodal Stereoscopic Movie Summarization Conforming to Narrative Characteristics

    Video summarization is a timely and rapidly developing research field with broad commercial interest, due to the increasing availability of massive video data. Relevant algorithms must strike a careful balance between summary compactness, enjoyability, and content coverage. The specific case of stereoscopic 3D theatrical films has become more important over the past years, but has not received corresponding research attention. In this paper, a multi-stage, multimodal summarization process for such stereoscopic movies is proposed that is able to extract a short, representative video skim conforming to narrative characteristics from a 3D film. At the initial stage, a novel, low-level video frame description method is introduced (frame moments descriptor) that compactly captures informative image statistics from luminance, color, optical flow, and stereoscopic disparity video data, both at a global and at a local scale. Thus, scene texture, illumination, motion, and geometry properties may succinctly be contained within a single frame feature descriptor, which can subsequently be employed as a building block in any key-frame extraction scheme, e.g., for intra-shot frame clustering. The computed key-frames are then used to construct a movie summary in the form of a video skim, which is post-processed in a manner that also considers the audio modality. The next stage of the proposed summarization pipeline essentially performs shot pruning, controlled by a user-provided shot retention parameter, which removes segments from the skim based on the narrative prominence of movie characters in both the visual and the audio modalities. This novel process (multimodal shot pruning) is algebraically modeled as a multimodal matrix column subset selection problem, which is solved using an evolutionary computing approach. Subsequently, disorienting editing effects induced by summarization are dealt with by manipulating the video skim. At the last step, the skim is suitably post-processed in order to reduce stereoscopic video defects that may cause visual fatigue.
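
The abstract casts multimodal shot pruning as matrix column subset selection solved by evolutionary computing, but does not give the formulation. The sketch below assumes the common objective: choose k columns C of a data matrix A (one column per shot) that minimize the residual ‖A − C C⁺ A‖_F, where C⁺ is the pseudoinverse. It further assumes that visual and audio per-shot features are stacked row-wise and that the shot retention parameter maps to k. The genetic operators, function names and parameters are hypothetical; this is a generic GA, not the authors' solver.

```python
import numpy as np


def css_error(A: np.ndarray, cols) -> float:
    """Residual ||A - C C^+ A||_F after projecting A onto the span of the chosen columns C."""
    C = A[:, cols]
    return float(np.linalg.norm(A - C @ np.linalg.pinv(C) @ A))


def evolve_column_subset(A: np.ndarray, k: int, pop_size: int = 30,
                         generations: int = 100, mutation_rate: float = 0.2,
                         seed: int = 0):
    """Toy genetic algorithm for column subset selection (hypothetical, not the paper's solver).

    Each individual is a set of k distinct column (shot) indices; lower residual is fitter.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    pop = [rng.choice(n, size=k, replace=False) for _ in range(pop_size)]
    for _ in range(generations):
        fitness = np.array([css_error(A, ind) for ind in pop])
        # Elitism: the better half survives unchanged, so the best error never increases.
        elite = [pop[i] for i in np.argsort(fitness)[: pop_size // 2]]
        children = []
        while len(elite) + len(children) < pop_size:
            i, j = rng.choice(len(elite), size=2, replace=False)
            # Crossover: draw k distinct columns from the union of two parents.
            child = rng.choice(np.union1d(elite[i], elite[j]), size=k, replace=False)
            if rng.random() < mutation_rate:
                # Mutation: replace one selected column with a currently unselected one.
                unused = np.setdiff1d(np.arange(n), child)
                if unused.size:
                    child[rng.integers(k)] = rng.choice(unused)
            children.append(child)
        pop = elite + children
    best = min(pop, key=lambda ind: css_error(A, ind))
    return np.sort(best), css_error(A, best)
```

For a multimodal matrix one might use `A = np.vstack([V / np.linalg.norm(V), S / np.linalg.norm(S)])`, with hypothetical visual features `V` and audio features `S` sharing one column per shot. Normalizing each block keeps one modality from dominating the residual.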

    GreekPolitics: Sentiment Analysis on Greek Politically Charged Tweets

    The rapid growth of on-line social media platforms has rendered opinion mining/sentiment analysis a critical area of research. This paper focuses on analyzing Twitter posts (tweets) written in the Greek language and politically charged in content. This is a rather underexplored topic, due to the inadequacy of publicly available annotated datasets. Thus, we present and release GreekPolitics: a dataset of Greek tweets with politically charged content, annotated for four different sentiments: polarity, figurativeness, aggressiveness and bias. GreekPolitics has been evaluated comprehensively using state-of-the-art Deep Neural Networks (DNNs) and data augmentation methods. This paper details the dataset, the evaluation process and the experimental results.

    Summarization of human activity videos via low-rank approximation

    Movie shot selection preserving narrative properties

    Stereoscopic Medical Data Video Quality Issues

    Stereoscopic medical videos are recorded, e.g., in stereo endoscopy or when video recording medical/dental operations. This paper examines quality issues in recorded stereoscopic medical videos, as insufficient quality may induce visual fatigue in doctors. No attention has been paid to stereo quality and ensuing fatigue issues in the scientific literature so far. Two of the most commonly encountered quality issues in stereoscopic data, namely stereoscopic window violations and bent windows, were searched for in stereo endoscopic medical videos. Furthermore, an additional stereo quality issue encountered in dental operation videos, namely excessive disparity, was detected and fixed. The conducted experiments demonstrate the existence of such quality issues in stereoscopic medical data and highlight the need for their detection and correction.
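
The abstract states that excessive disparity was detected and fixed but does not describe how. The sketch below assumes a per-pixel disparity map (in pixels) is already available for each frame, that a comfort range [d_min, d_max] is chosen by the user, and that correction uses horizontal image translation, i.e. shifting one view horizontally relative to the other. The function names and thresholds are hypothetical, and this is not presented as the paper's method.

```python
import numpy as np


def excessive_disparity_ratio(disparity: np.ndarray, d_min: float, d_max: float) -> float:
    """Fraction of pixels whose disparity (pixels) falls outside the comfort range [d_min, d_max]."""
    return float(np.mean((disparity < d_min) | (disparity > d_max)))


def hit_shift(disparity: np.ndarray, d_min: float, d_max: float) -> float:
    """Constant disparity offset (pixels) that centres the frame's disparity range in [d_min, d_max].

    Horizontally translating one view relative to the other adds the same offset to
    every disparity value, so it moves the whole depth budget but cannot shrink it.
    """
    lo, hi = float(disparity.min()), float(disparity.max())
    return (d_min + d_max) / 2.0 - (lo + hi) / 2.0
```

If the observed range (max minus min) is wider than the comfort range, a constant shift cannot fix every pixel; in that case the ratio returned by `excessive_disparity_ratio` after shifting stays above zero and a non-uniform disparity remapping would be needed.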

    Deep Reinforcement Learning with semi-expert distillation for autonomous UAV cinematography

    Unmanned Aerial Vehicles (UAVs, or drones) have revolutionized modern media production. Being rapidly deployable “flying cameras”, they can easily capture aesthetically pleasing aerial footage of static or moving filming targets/subjects. Current approaches rely either on manual UAV/gimbal control by human experts or on a combination of complex computer vision algorithms and hardware configurations for automating the flight+filming process. This paper explores an efficient Deep Reinforcement Learning (DRL) alternative, which implicitly merges the target detection and path planning steps into a single algorithm. To achieve this, a baseline DRL approach is augmented with a novel policy distillation component, which transfers knowledge from a suitable, semi-expert Model Predictive Control (MPC) controller into the DRL agent. Thus, the latter is able to autonomously execute a specific UAV cinematography task with purely visual input. Unlike the MPC controller, the proposed DRL agent does not need to know the 3D world position of the filming target during inference. Experiments conducted in a photorealistic simulator show superior performance and training speed compared to the baseline agent, while the proposed agent also surpasses the MPC controller in terms of visual occlusion avoidance.
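
The abstract does not specify the distillation objective or the action space. A common form of policy distillation from a controller adds an imitation term to the agent's RL loss, pulling the student policy toward the teacher's actions. The sketch below assumes a discrete action space, uses the MPC teacher's chosen action index as the target, and weights the cross-entropy term by a hypothetical coefficient `beta`; all names are illustrative, not the paper's.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax over action logits."""
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits: np.ndarray, teacher_actions: np.ndarray) -> float:
    """Mean cross-entropy between the student policy and the teacher's chosen discrete actions."""
    probs = softmax(student_logits)
    picked = probs[np.arange(len(teacher_actions)), teacher_actions]
    return float(-np.mean(np.log(picked + 1e-12)))  # epsilon guards against log(0)


def combined_loss(rl_loss: float, student_logits: np.ndarray,
                  teacher_actions: np.ndarray, beta: float = 0.5) -> float:
    """RL objective plus a weighted imitation term (beta is a hypothetical trade-off weight)."""
    return rl_loss + beta * distillation_loss(student_logits, teacher_actions)
```

Because only the student consumes images at inference time, the teacher's privileged inputs (such as the target's 3D position) are needed only while computing `teacher_actions` during training; `beta` is often annealed toward zero as the agent improves.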